Online Monitoring of Sourdough Fermentation Using a Gas Sensor Array with Multivariate Data Analysis

Sourdough can improve bakery products’ shelf life, sensory properties, and nutrient composition. To ensure high-quality sourdough, the fermentation has to be monitored. The characteristic process variables for sourdough fermentation are pH and the degree of acidity measured as total titratable acidity (TTA). The time- and cost-intensive offline measurement of these process variables can be improved upon by using online gas measurements in prediction models. Therefore, a gas sensor array (GSA) system was used to monitor the fermentation process of sourdough online by correlating exhaust gas data with offline measurement values of the process variables. Three methods were tested to build the models from the features extracted from the GSA signals. The most robust prediction models were achieved by applying a principal component analysis (PCA) to all features and combining two fermentations. The calibrations with the extracted features had a percentage root mean square error (RMSE) from 1.4% to 12% for the pH and from 2.7% to 9.3% for the TTA. The coefficient of determination (R²) for these calibrations was 0.94 to 0.998 for the pH and 0.947 to 0.994 for the TTA. The obtained results indicate that the online measurement of exhaust gas from sourdough fermentations with gas sensor arrays can be a cheap and efficient application to predict pH and TTA.


Introduction
Sourdough is one of the oldest examples of natural starters, mainly used for making fermented baked goods as an alternative to baker's yeast and chemical leavening. Sourdough fermentation is a unique tool for improving the rheology, sensory properties, shelf life, and nutrient composition of gluten-free formulations [1]. The quality and properties of sourdough depend on many technological and ecological influences. The most critical factors for sourdough production are the contents and enzymatic activity of the cereal, the controllable process parameters, and the microflora of lactic acid bacteria, yeasts, and other microorganisms [2]. Sourdough fermentation is characterized by the process variables pH, which is essential for inhibiting enzyme activity, and the degree of acidity measured as total titratable acidity (TTA), which accounts for the evaluation of the sensory properties.
The ability of gas sensors to measure specific gases like O₂, CO₂, and hydrogen sulfides, which are linked to spoilage, is a critical topic with respect to quality, freshness, and safety [3]. Fermentation monitoring uses gas sensor arrays that combine specific gas sensor signals with pattern recognition [4]. Depending on the research goal, the sensor is used for qualitative (i.e., improving sensory attributes) or quantitative (i.e., monitoring) measurement [5]. Research on fermentation monitoring with electronic nose techniques splits into submerged and solid-state fermentation. Submerged fermentation, which includes all fermentations in the presence of excess water, is relevant to our research. Here, the electronic nose mainly uses fingerprints obtained from odor [1]. Pinheiro et al. [6] investigated aroma production with an electronic nose for monitoring the fermentation process by using the unspecific sensor signal corresponding to ethanol concentration. Zhang et al. [7] investigated fermentation monitoring by alcoholic quantification with the use of an electronic nose as well as near-infrared spectroscopy. Several studies use gas sensor arrays to monitor fermentation, like Genzardi et al. [8] and Oikonomou et al. [9].
Monitoring the process variables pH and TTA offline is a cost- and time-consuming procedure. Grote et al. [10] used fluorescence spectroscopy to monitor the sourdough fermentation process online. In the study by Bolarinwa et al. [11], the influence of processing conditions on the levels of pH and TTA was determined in rice sourdough. They developed a prediction model that could predict the response of pH and TTA. A cheap and effective alternative is monitoring with a gas sensor array (GSA) system. With increased performance and availability, the properties of GSA, like a fast assessment of headspace and the ability for quantitative representation of gas mixtures, bode well for monitoring tasks. This proved to be especially useful for microbial fermentation monitoring with the analysis of exhaust gas [12]. The challenge of measuring pH during sourdough fermentation is that the sourdough consists of liquid and solid parts that can influence the electrode. A reliable result can only be achieved by diluting a sample. Therefore, predicting critical measurement parameters like pH and acidity with a GSA system can be a cheap and effective alternative.
Electronic noses have been widely established for the determination of the chemical composition in fermented food and beverage applications [13,14]. Here, they are used for sensory evaluation, for example, by identifying essential aroma compounds that are responsible for the staling of bread [15], or for qualitative analysis to detect product adulteration like substituting corn and rice syrup in honey [16]. However, due to their unspecificity, their ability for quantitative analysis needs to be improved. Using soft sensor models improves the effectiveness and possible application areas for GSA systems. Many approaches to fermentation monitoring with soft sensors, including data processing techniques such as multiple least square support vector machine, neural network, deep learning, fuzzy logic, and probabilistic latent variable models, have been collected by Zhu et al. [17]. Viejo et al. [18] showed that with machine learning, it was possible to build highly accurate and precise models to determine the type of wheat and the volatile components of sourdough bread. Mei et al. [19] used a practical soft sensor modeling approach that combines PCA and Gaussian process regression to predict the biomass concentration in a fermentation better than a neural network and support vector machine model.
The previous studies correlate the gas sensor output to certain process variables or aroma characteristics. These studies visualize the kinetics of the associated target. This paper presents a GSA system for evaluating the online prediction capability of the process variables pH and TTA. Our approach aims to create prediction models that allow interpolation and, therefore, enable automating monitoring tasks for sourdough fermentation. This should be achieved by feeding the models large amounts of data on different temperature and flour type instances. Therefore, the prediction models should be broadly usable for sourdough fermentation. We carried out experiments with three different temperatures and various types of flour to investigate the performance of our analysis approach. Consequently, we provide the following contributions:

• A multivariate data analysis approach for the sourdough fermentation process.

• A correlation of features from the online GSA measurement values with offline measurements of the process variables.

• Creation of prediction models with a parametric regression approach.
The remainder of this paper is structured as follows. In Section 2, we explain how sourdough fermentation works, describe the gas sensor array system, and illustrate the procedure for the experiments. Section 3 shows the obtained results. Finally, Section 4 discusses the achieved results, and Section 5 puts them into context and shows further possibilities.

Methods and Materials
First, Section 2.1 explains the sourdough fermentation process. Afterwards, we describe the working principle and application of the gas sensor array system in Section 2.2. Last, Section 2.3 illustrates the procedure for the experiments.

Sourdough Fermentation
Sourdough is a dough fermented with microorganisms, primarily lactic acid bacteria and active or reactivated yeasts. The acidification of the dough is only obtained by fermentation. During the fermentation, sugars are cleaved into carbon dioxide, which is incorporated in the dough to increase its volume, and into small amounts of alcohol and aroma components, which include lactic and acetic acid. Fresh sourdough is initiated by mixing flour and water and leavening it at a warm temperature. After 12-24 h, spontaneous fermentation leads to an acidic and alcoholic odor of the mixture. The TTA can provide a simple estimate of the total acid content but cannot differentiate the acids within the food sample. Whereas the TTA is a better predictor of the acids' impact on flavor than pH, the pH can better describe how well microorganisms can grow in a food matrix due to the dependence on hydronium ion concentration [1]. Hence, both measurements might be relevant to control the fermentation process.
Besides the dough acidity, the dough yield (DY) is an essential parameter for characterizing sourdoughs. It describes the dough consistency through the ratio between water and flour in the dough. It is calculated with the following formula [20] (see Equation (1)):
$$\mathrm{DY} = \frac{m_{\mathrm{flour}} + m_{\mathrm{water}}}{m_{\mathrm{flour}}} \times 100 \qquad (1)$$
where $m_{\mathrm{flour}}$ and $m_{\mathrm{water}}$ are the masses of flour and water in the dough. The same DY in sourdoughs does not mean that they have the same consistency, because different flours have different abilities to absorb water. Generally, sourdoughs with a DY of 150-160 have a firm consistency, and doughs with a DY of 200 begin to show a liquid consistency [20]. Also, the acidification rate increases with higher DY due to the enhanced diffusion of components in the dough with increased fluidity. Faster acidification of the dough means that the fermentation times are reduced as well [21]. In this work, we investigated different doughs based on different flour types and different DY characteristics.
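The dough yield can be computed directly from the recipe masses. The following minimal Python sketch assumes the common definition of DY as dough mass (flour plus water) per 100 parts of flour. The function names are illustrative and not part of the original work, and the contribution of the starter is ignored for simplicity.

```python
def dough_yield(flour_g: float, water_g: float) -> float:
    """Dough yield: mass of dough (flour + water) per 100 parts of flour.

    Assumes the common definition DY = (flour + water) / flour * 100.
    """
    if flour_g <= 0:
        raise ValueError("flour mass must be positive")
    return (flour_g + water_g) / flour_g * 100.0


def water_for_dough_yield(flour_g: float, dy: float) -> float:
    """Water mass needed to reach a target dough yield for a given flour mass."""
    if dy < 100:
        raise ValueError("a dough yield below 100 would require negative water")
    return flour_g * (dy - 100.0) / 100.0
```

For example, equal masses of flour and water give a DY of 200, and 1000 g of flour needs 600 g of water to reach a DY of 160.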

Gas Sensor Array System
We measured online exhaust gas with a self-assembled measurement system [22]. Figure 1 shows the setup for the GSA system. The electricity for the GSA setup is provided by a multifunction AC/DC voltage source (1). The exhaust gas reaches the GSA setup from the connection tube (2) of the fermenter. The exhaust gas is led to the gas chamber (3), which is also connected to a gas flow meter (4) that receives oxygen from a gas flask that is connected by a tube (5) to the setup. The signal measured in the gas chamber is transferred to the Arduino Mega 2560 (6), which forwards the signal to the Matlab script on the connected laptop.
Figure 2 provides a schematic view of the GSA system and how it is integrated into the fermentation setup. The measurement system contains two main parts: the headspace sampling system and the measurement chamber. The headspace sampling procedure consists of an automated sequence of internal operations. First, every five minutes, the headspace samples of the fermenter are pumped past the measurement chamber for 10 s at a flow rate of 600 mL/min with a diaphragm pump (Schwarzer Precision). The measurement chamber has a volume of 250 mL and contains a gas sensor array equipped with commercially available metal oxide semiconductor (MOS) gas sensors (TGS 822, TGS 813, and MQ3). In the next step, the chamber is flushed with pure oxygen to regenerate the sensors. Due to the filling and flushing of the measurement chamber, peak-shaped measurement signals are obtained every 5 min. The analog measurement signal is converted to a digital signal by a microcontroller and forwarded to the computer interface, where it is integrated and processed with a prepared Matlab script.
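The peak-shaped signal obtained in each five-minute cycle can be summarized by a peak height and a peak area per sensor. The following Python sketch illustrates one way to do this. It is not the Matlab script used in this work: the constant baseline, the sampling period `dt_s`, the clipping of negative values, and the function names are assumptions made for illustration.

```python
def split_intervals(samples, samples_per_interval):
    """Cut a continuous recording into consecutive, complete intervals."""
    if samples_per_interval <= 0:
        raise ValueError("samples_per_interval must be positive")
    n_complete = len(samples) // samples_per_interval
    return [
        samples[i * samples_per_interval:(i + 1) * samples_per_interval]
        for i in range(n_complete)
    ]


def peak_features(interval, baseline, dt_s=1.0):
    """Return (peak_height, peak_area) of one interval above the baseline.

    interval -- sensor readings of a single measurement cycle
    baseline -- sensor reading without sample gas (assumed constant here)
    dt_s     -- sampling period in seconds (assumed value)
    """
    if not interval:
        raise ValueError("interval must contain at least one reading")
    # Readings below the baseline are clipped to zero (a design choice).
    corrected = [max(value - baseline, 0.0) for value in interval]
    height = max(corrected)
    # Trapezoidal rule for the area under the baseline-corrected peak.
    area = sum((a + b) / 2.0 * dt_s for a, b in zip(corrected, corrected[1:]))
    return height, area
```

Applying `peak_features` to every interval of each of the three sensors yields six values per cycle, one height and one area per sensor.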
The gas sensor output is used as the source for the independent variables, while pH and TTA are the dependent variables whose relation to the independent variables should be predicted. We analyzed the raw signal from the gas sensors (TGS 822, TGS 813, and MQ3) in a Matlab script designed to extract the peak height and the peak area in each five-minute interval. From the feature extraction, six independent variables (i.e., 3 sensors × 2 features per sensor) were obtained. Figure 3 visualizes the feature extraction from the raw data for the whole fermentation of F5. Regardless of how the independent variables were determined, they were inserted in the same procedure to implement a process model. For this, the measurement values of the dependent and independent variables were used in a process regression that correlates the inputs to supply parameters for predicting the behavior of pH and acidity for their respective fermentation. The established parameters were tied to the corresponding variables in the regression formula. With Excel's Solver function, the parameters were optimized to express the correlation of the corresponding variables. Every five minutes, a reference point was determined that was used for the error calculation. Due to fermentation runtimes of 10 h, an initial vector of 120 values was created for every fermentation. Each model estimation was evaluated by calculating the sum of squared errors (SSE), the root mean square error (RMSE), and the percentage error (% Error) of the RMSE, which adjusts the error to the range of values. Additionally, regression curves correlating the predicted values to the measured values were created, and a coefficient of determination (R²) was assigned. Three different methods for the determination of the independent variables were used and are explained separately in the following:

1.
The sensor features are filtered for the time corresponding to the taking of the offline sample (dependent variable). The offline values for pH and acidity and the corresponding GSA outputs were used as inputs for the regression equations. Two regression equations were established, one for each feature the regression was based on.
For the sensor features, the feature values were adjusted by subtracting the baseline value of the GSA measurement from the feature value. The regression equations for the peak height and the peak area are given as Equations (2) and (3), with C as the predicted value for the dependent variable, Kn as the regression parameters, BL as the baseline, PH for peak height, and PA for peak area. After evaluating the sensor features separately and combined in our calibrations, we decided to use peak height because it delivered better results. We refer to this method as the sensor signal method.

2.
In the second method, the independent variables were determined using a PCA script on the six extracted features. The features were transformed into two principal components, with the values along the main axes as the output. This reduced the dimensionality of the six features as a collection of variables while maintaining the same length of values in the data matrix. Analogous to the first approach, the regression model was created using the transformed values of the two principal components as independent variables. We refer to this method as the PCA regression method. The regression equation is given as Equation (4).

3.
For the third method, the raw data were split into datasets corresponding to the peak-shaped five-minute intervals extracted by the Matlab script. The feature extraction was not executed because enough data points per dataset had to remain for further data analysis. In the next step, a PCA script continuously analyzed all intervals of one fermentation to assign a score to each interval. The offline data were interpolated for each interval to correspond to the eigenvalues. Analogous to the first approach, the eigenvalues were used as the independent variable for creating the process models. Only one principal component was considered as an input for the regression model, because the first principal component had an explained variance of over 99.5%. Therefore, the second principal component would only add noise to the process model. Still, the initial vector for the model evaluation contains 120 values due to the transformation of the 5 min intervals. We refer to this method as the interval method. Equation (5) gives the corresponding regression formula.
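The PCA regression method described above can be sketched as follows. This is a minimal illustration with NumPy and not the Matlab and Excel workflow used in this work: the standardization of the features, the linear form of the regression, and the ordinary least-squares fit are assumptions, and all function names are hypothetical.

```python
import numpy as np


def pca_scores(features, n_components=2):
    """Project standardized features onto the leading principal components.

    Returns the scores and the explained-variance ratio of each component.
    """
    x = np.asarray(features, dtype=float)
    mean = x.mean(axis=0)
    scale = x.std(axis=0, ddof=1)
    scale[scale == 0] = 1.0          # avoid division by zero for flat columns
    z = (x - mean) / scale
    # Rows of vt are the principal axes, ordered by explained variance.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    return z @ vt[:n_components].T, explained[:n_components]


def fit_linear(scores, target):
    """Least-squares fit of target = k0 + k1 * PC1 + k2 * PC2 (assumed form)."""
    design = np.column_stack([np.ones(len(scores)), scores])
    params, *_ = np.linalg.lstsq(design, np.asarray(target, dtype=float), rcond=None)
    return params


def predict_linear(params, scores):
    """Evaluate the fitted linear model on a matrix of PCA scores."""
    design = np.column_stack([np.ones(len(scores)), scores])
    return design @ params
```

With a feature matrix of 120 rows (one per five-minute interval) and six columns, `pca_scores` returns a 120 × 2 score matrix that can be passed to `fit_linear` together with the pH or TTA values.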
For certain combinations of process models, we performed a validation by inserting the model parameters of one model into the regression equation of a validation set.The ability of the model parameters to predict the behavior of another process model is evaluated using SSE, RMSE, and the percentage model error.
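The evaluation criteria named above can be written compactly. The sketch below assumes that the percentage error is the RMSE divided by the range of the measured values, and it uses 1 − SSE/SST as one common definition of R²; both are interpretations for illustration rather than the exact spreadsheet formulas of this work.

```python
import math


def model_errors(measured, predicted):
    """Return SSE, RMSE, RMSE as a percentage of the measured range, and R^2."""
    if len(measured) != len(predicted) or len(measured) < 2:
        raise ValueError("need two sequences of equal length with >= 2 values")
    residuals = [m - p for m, p in zip(measured, predicted)]
    sse = sum(r * r for r in residuals)
    rmse = math.sqrt(sse / len(measured))
    value_range = max(measured) - min(measured)
    if value_range == 0:
        raise ValueError("measured values must not be constant")
    mean = sum(measured) / len(measured)
    sst = sum((m - mean) ** 2 for m in measured)
    return {
        "sse": sse,
        "rmse": rmse,
        "percent_error": 100.0 * rmse / value_range,  # assumed normalization
        "r2": 1.0 - sse / sst,
    }
```

Normalizing by the range is what makes pH errors (a span of a few pH units) comparable with TTA errors, which are expressed on a different scale.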

Experiment Design
This section describes the procedure and the design of our experiments for analyzing the performance of the gas sensor array system. The appendix provides an overview of the used instruments and materials (cf. Appendix A).
The sourdough was prepared with three different flours: two rice flours, one of them a white flour (Heimatsmühle) and the other a wholegrain flour (Heimatsmühle), and a white wheat flour (Rettenmaier Mühle). As a starter, the "Reinzucht-Sauerteig Reis" from Böcker was used. The moisture content was about 58%, and the pH was about 3.7. It was stored at 4 °C to 6 °C.
A total of 16 fermentations, named F1 to F16, were carried out to provide a variance of the fermentation conditions. The experimental design measured each flour twice at 28 °C and 32 °C with a DY of 200. For the validation, each flour was additionally measured at 30 °C with a DY of 200. Table 1 shows the used flour and starter batch labels. Flour A was measured thrice at 28 °C, because it was unclear whether enough data points were available after the GSA system crashed. Table 2 shows a measurement scheme with the temperature and flour combinations.
Figure 4 shows the experimental setup of the sourdough fermentation. To heat the stainless-steel fermenter (1), a water bath (2) (Fisions) was connected to the water inlet and outlet connections of the fermenter and set to the desired temperature. The outlets on the side of the fermenter were sealed; the lowest outlet (3) was intended for taking offline samples of the sourdough. The lid closing the fermenter on the top side was connected with an impeller mixer (4) (Hydro-Mec) and an engine (5). The mixer was set to the second speed level. Two of the three outlets on the lid were covered, and one was connected to the gas sensor module (6), which leads the exhaust gas from the fermenter to the gas sensor.
At the beginning and every consecutive hour of each fermentation, the pH and the TTA were measured offline. The pH and TTA were measured with a pH meter (Xylem) and pH electrodes (VWR, Xylem). In the appendix, we describe the procedures for measuring pH and TTA in detail (cf. Appendix B). These reliable manual measurements using a standard procedure provide a reference for the calibration with the values measured by the gas sensor array system. To illustrate the fermentation process, we included Figure A1 with the change in pH and TTA from F9 in Appendix A.
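Because the offline values are available hourly while the GSA delivers one signal every five minutes, the offline data have to be brought onto the five-minute grid before they can be compared point by point. The sketch below uses simple linear interpolation, which is an assumption; the interpolation scheme of this work is not specified here, and the function name and the choice of the interval end as reference time are illustrative.

```python
from bisect import bisect_left


def interpolate_offline(times_min, values, step_min=5.0):
    """Linearly interpolate sparse offline values onto a regular time grid.

    times_min -- strictly increasing sampling times in minutes
    values    -- offline measurements (e.g., pH or TTA) at those times
    Returns (grid_times, grid_values), one entry per step after the start.
    """
    if len(times_min) != len(values) or len(times_min) < 2:
        raise ValueError("need at least two matching time/value pairs")
    start, end = times_min[0], times_min[-1]
    n_steps = int(round((end - start) / step_min))
    grid_times, grid_values = [], []
    for i in range(1, n_steps + 1):
        t = start + i * step_min
        # Index of the right-hand neighbour, clamped to a valid segment.
        j = min(max(bisect_left(times_min, t), 1), len(times_min) - 1)
        t0, t1 = times_min[j - 1], times_min[j]
        v0, v1 = values[j - 1], values[j]
        grid_values.append(v0 + (v1 - v0) * (t - t0) / (t1 - t0))
        grid_times.append(t)
    return grid_times, grid_values
```

For a 10 h fermentation sampled hourly (11 offline values), this yields 120 interpolated reference points, matching the length of the initial vector used for the error calculation.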

Results
In what follows, we present the results of the analysis of the measurements with the gas sensor.

Process Model Evaluation
To analyze the process models' accuracy, we calculated the SSE, RMSE, and RMSE percentage error. The results are shown in the order of sensor signal, PCA regression, and interval methods.
pH Sensor Signal Method. Table 3 shows the results of the pH sensor signal method. The features of peak height and peak area were implemented as separate process models. After testing the combination of the two features on several fermentations, this approach was disregarded due to higher errors than the single-feature models. Except for fermentations F14 and F15 of the pH sensor signal method, each percentage error stayed under 10%. Comparing the peak height with the peak area model showed that the peak height model had a lower error rate. Due to the adjustment of the percentage error to the range of values, a comparison of the pH and TTA model errors was possible. Although the TTA model does not have outliers like fermentations F14 and F15 from the pH model, its average error is higher than that of the pH models.
PCA Regression Method. The process models for the PCA regression method were built by grouping the fermentations of one flour type by temperature (28 °C and 32 °C) and by combining both temperatures. The grouping of the fermentations was decided after detecting that the grouped data resulted in lower error rates than the single-fermentation models. The grouped fermentations are titled in Table 4 with the abbreviations for the flour type (A, B, and C) and the temperature, in which 28 °C and 32 °C contain the two respective fermentations. The combination is indicated by the abbreviation with an asterisk (*), and the two fermentations used from each temperature are given in parentheses. The results of the error calculation for the PCA regression model show mostly errors under 10% with few exceptions, namely the temperature combination of A with a percentage error of 16% for TTA and the 32 °C grouped data from B with a percentage error of 10% for TTA. The last outlier was the temperature combination of C with a percentage error of 12% for pH.
Interval Method. The last method to be evaluated for the model's error is the interval method. Table 5 presents the model errors for fermentations F1 to F16. The interval method had three major percentage model errors in F7 and F11, with errors over 15%. Model errors between 10% and 15% were detected in F2, F3, F7, F12, F14, and F16. Every model with a major pH or TTA error also had at least 10% to 15% in the other criterion. This was not the case if a model had an error between 10% and 15%. The grouped fermentations for the interval method in Table 6 are titled in the same way as the ones from the PCA regression method in Table 4. The grouped temperature models showed a higher error than the single-fermentation models. In particular, the 32 °C model of flour type B and the 28 °C model of flour type C had percentage errors of over 15%.

Coefficient of Determination (R²)
The determination of the R² was only possible for the sensor signal models, because the integration of the PCA in the modeling methods caused a higher scattering of the predicted values. The determined R² values for the sensor signal models of fermentations F1 to F16 are shown in Table 7.
The R² values for the peak height models were generally higher than the ones for the peak area models. The comparison of pH and TTA models showed that neither had consistently better values of R².

Validation of the Models
The validation was carried out by inserting the model parameters of one model in the regression equation of another model. For example, F4 with F8 means that the model parameters from F8 were inserted in the regression equation of F4. Tables 8-10 show the SSE, RMSE, and RMSE percentage model error of the validation models for the three different model methods.
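The validation procedure can be illustrated with a short Python sketch: parameters are fitted on one fermentation and then applied unchanged to the features of another, and the errors are computed against the measured values of that second fermentation. The linear model form, the least-squares fit, and the function names are assumptions for illustration.

```python
import numpy as np


def fit_parameters(features, measured):
    """Least-squares parameters of measured = k0 + sum(k_i * feature_i)."""
    x = np.asarray(features, dtype=float)
    y = np.asarray(measured, dtype=float)
    design = np.column_stack([np.ones(len(x)), x])
    params, *_ = np.linalg.lstsq(design, y, rcond=None)
    return params


def validation_errors(params, features, measured):
    """Apply parameters from another model and return (SSE, RMSE, % error)."""
    x = np.asarray(features, dtype=float)
    y = np.asarray(measured, dtype=float)
    predicted = np.column_stack([np.ones(len(x)), x]) @ params
    sse = float(np.sum((y - predicted) ** 2))
    rmse = float(np.sqrt(sse / len(y)))
    # Percentage error relative to the range of the measured values (assumed).
    percent = 100.0 * rmse / float(y.max() - y.min())
    return sse, rmse, percent
```

In the notation used above, "F4 with F8" would correspond to `validation_errors(fit_parameters(features_F8, ph_F8), features_F4, ph_F4)`, where the four arrays are placeholders for the respective fermentation data.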
pH Sensor Signal Method. The validations for the sensor signal method were only carried out for the peak height feature due to its lower error rate compared with the peak area feature. To evaluate whether the models can describe the process within their flour type and temperature, the model parameters of these fermentations were inserted into each other. The validations showed a high percentage error, except for the model parameters of F4 and F10. In comparison, the percentage errors for TTA are lower than for pH.
PCA Regression Method. The validations for the PCA regression method were carried out by inserting the model parameters of the grouped temperature models into the 30 °C calibration of their respective flour type.
The validations for the PCA regression method had a lower error rate than the validations of the sensor signal method within their temperature. The percentage error is lower in the validations for the pH than it is for the TTA validations.
Interval Method. The validations for the interval method were based on examining different combinations between the grouped temperatures and the 30 °C validation temperature. The interval method showed a high scattering effect of the predicted values compared with the other methods. The model parameters had a wide range of −200 to 200, which made a precise estimation of the validation sets difficult. Further validations were disregarded due to their high error rate.

Discussion
The results of the measurements using our evaluation settings indicate that for both pH and TTA, the best validation predictions were obtained by the PCA regression method.We interpret and discuss the results in detail in this section.Further, we explain the identified threats to validity.

Offline Data of Sourdough Fermentation
We performed offline measurements in the laboratory to confirm the results measured with the GSA system. The pH and TTA values behaved mostly as expected. For the pH values, each fermentation, except for F4, F12, and F14, had a sigmoid downward trend opposed to the growth of the microorganisms. The other ones showed a more parabolic behavior. This could have resulted from a faster or slower accommodation of the starter microorganisms in these fermentations. The TTA values showed an almost linear increase. The different flour types and temperatures influenced the fermentation as expected. A higher temperature resulted in a faster process regarding pH and TTA. The starting pH of the white flours was lower than that of the wholegrain flour, while the starting TTA of the wholegrain flour was higher than that of the white flours. This could be explained by the degree of grinding and the state of the compounds in the flours. Due to the high degree of grinding, the white flours exhibit damaged starch molecules. These release more directly fermentable substrates that contribute to faster pH lowering. On the other hand, wholegrain flour still contains more protein and enzymes that exhibit a buffering effect [23]. Bolarinwa et al. [11] investigated the influence of temperature and fermentation time on pH and TTA by creating a prediction model using response surface methodology. In comparison, we varied temperature and flour types to increase the variance of the calibration input and then examined the prediction performance for our target variables. The R² for their model is 0.88 for pH and 0.887 for TTA, while our models range from 0.94 to 0.998 for pH and 0.947 to 0.994 for TTA. Other evaluation criteria were not stated and cannot be compared.
The HPLC results contribute insight into the metabolic processes during the fermentations. Each flour type had a different composition of sugars (glucose and maltose), but the microorganisms mainly converted glucose. This was visible by examining the change in concentrations of maltose between the wholegrain rice flour and the wheat flour. The wholegrain flour showed a low maltose concentration, which was still not completely converted. Conversely, wheat flour showed a high concentration of maltose, but its concentration did not change drastically. In contrast, the concentration of glucose increased and decreased during the fermentation. The assumption that the content of organic acids was higher in wholegrain flour than in white flour was confirmed by the HPLC. Due to the lower degree of grinding of the flour, more enzymes were available to convert a broader range of substrates. The results follow this trend.

Process Models
The process models for the sensor signal method had an adjusted percentage error of less than 10% for all models of the fermentations, except the ones for F14 and F15 of the peak area. They had a high coefficient of determination for predicting their own fermentation but showed high errors for the validation with their paired temperature of the same flour type. The PCA regression method had low errors for the grouped temperature models and, compared with the sensor signal method, lower error values for the validations. The interval method contributed mixed error rates for the individual fermentation models and, for the validations that were carried out, a percentage error of over 15% except for one combination.
The aim for the validation errors was to be less than 10%; these values were mainly not reached. The best results for the validation models were achieved with the PCA regression method. Several reasons could cause the validation of the methods not to satisfy the error requirements for the broad range of process models. The first reason is tied to the sensor signal and PCA regression methods, because the interval method did not use feature extraction. The feature extraction of the peak area included a certain amount of noise. By further adjusting the script steps that capture the interval, less noise could be incorporated in the peak area feature. A second reason can be found in the GSA system. The already mentioned system crashes during the fermentations led to data gaps in the gas sensor online data. These gaps were carried over to later operations like feature extraction or PCA. There, the predicted values deviate noticeably after the data gap. This, in turn, impacts the error rate of the process model. The third reason is the high scattering of the PCA values obtained from the interval method. In this method, scores were assigned for every five-minute interval. Still, due to the similarity of the interval inputs from the raw data, the PCA contributed only slight variance inside the range of the intervals. It must also be considered that the fermentation temperature changes the composition of the gas phase of the exhaust gas and, therefore, the signal response of the sensors. This could affect the prediction ability of models that combine data from different fermentation temperatures. To ensure the ability to monitor the process online, the PCA can be integrated into the data collection during measurement. In this work, the methods were evaluated on the measured data, but with a verified method, the data can be processed online without time delay.
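One possible way to integrate the PCA into the data collection is to store the calibration results (feature means, scaling, loadings, and regression parameters) and apply them to each new six-feature vector as it arrives. The following NumPy sketch illustrates this idea under the same assumptions as a standardized PCA with a linear regression on the scores; the class and method names are hypothetical, and this is not the implementation used in this work.

```python
import numpy as np


class OnlinePcaModel:
    """Stores calibration results and scores one feature vector at a time."""

    def __init__(self, mean, scale, loadings, params):
        self.mean = mean
        self.scale = scale
        self.loadings = loadings
        self.params = params

    @classmethod
    def calibrate(cls, features, target, n_components=2):
        """Fit PCA loadings and linear regression parameters offline."""
        x = np.asarray(features, dtype=float)
        mean = x.mean(axis=0)
        scale = x.std(axis=0, ddof=1)
        scale[scale == 0] = 1.0
        z = (x - mean) / scale
        _, _, vt = np.linalg.svd(z, full_matrices=False)
        loadings = vt[:n_components].T
        design = np.column_stack([np.ones(len(z)), z @ loadings])
        params, *_ = np.linalg.lstsq(
            design, np.asarray(target, dtype=float), rcond=None
        )
        return cls(mean, scale, loadings, params)

    def predict(self, feature_vector):
        """Predict the process variable for a single new measurement cycle."""
        z = (np.asarray(feature_vector, dtype=float) - self.mean) / self.scale
        scores = z @ self.loadings
        return float(self.params[0] + scores @ self.params[1:])
```

After calibration on completed fermentations, `predict` only needs one matrix-vector product per cycle, so a new estimate could be produced every five minutes as soon as the peak features are available.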
Many studies use soft sensor models in fermentation monitoring. Mei et al. [19] proposed a multimodel method using Gaussian process regression and PCA to construct a soft sensor for fermentation processes to estimate biomass concentration. Similar to our approach, they used PCA to extract features and then integrated them into regression models. While our approach extracted the principal components to implement different methods into the parametric regression models, their approach calculates weights from submodel variance to combine them into a final prediction model. We combined a data-driven approach with local models for the best prediction performance. However, there are no other approaches to building soft-sensing (pH/TTA) monitoring models for sourdough fermentation, so we cannot compare our results directly.

Threats to Validity
One limitation is the applicability of the results regarding other flour types and their influence on the final product. With supporting measurements, e.g., rheology for the characterization of the sourdough, it would become apparent whether the models are applicable to sourdoughs from different flour types. Similarly, sensory evaluation has yet to be carried out to verify whether products from the monitored sourdough would satisfy the requirements of consumers. These steps can be implemented to continue this work and to ensure the relevance of the developed models. To justify replacing the methods used to measure pH and TTA with our models, we need to compare the estimated errors of traditional methods (i.e., using a pH electrode to measure the pH and TTA) with the errors that occur in our models. While the pH measurement device is supposed to have an accuracy of ±0.1 pH units, it is still influenced by several factors like fermentation conditions, electrode calibration, preparation, and withdrawal of the samples. An estimated error of 5% for our model, which specific validations already reached, could be achieved more generally by tuning the method with more input data and a streamlined procedure. This would bring our method close to the margin of the traditional method while being less susceptible to errors occurring during the measurement, although a model error of 5% would still be higher than the error of the traditional method. Moreover, the increase in accuracy by tuning cannot be estimated safely, which can result in higher or lower degrees of accuracy improvement.
Specifically, the results of the process models could have been improved by carrying out more fermentations with the same substrate and temperature to build a sufficient data foundation for the model creation and further data analysis. In this vein, using wheat flour instead of a third rice flour was too ambitious. An experimental plan with a third rice flour or more fermentations with the same substrates and temperatures might have led to more balanced and coherent datasets. More extensive training and validation datasets would open up opportunities to use machine learning methods such as neural networks to improve the prediction capability of the GSA. A reordering of the measurement data could improve the robustness of the models by making the features invariant to temperature and flour type. Using neural networks with GSA has already proved successful in the work of Omatu and Yano [24]. By applying neural networks to time series data from GSA, they achieved a classification rate of 89% to 96% for tea and coffee odors.
One important factor to consider is the temperature drift caused by temperature changes in the environment, which greatly affects the precision and measurement stability of the gas sensors. As a solution, the approach by Xu et al. [25] can be used in future experiments. They proposed a compensation training method based on random forest, which improved the accuracy of the GSA by about 1%.
The use of the GSA system to predict process variables is promising, and with further research and the implementation of suitable methods and tools, it can become a practical and easily implementable sensor. A possible idea for the future use of the GSA system would be to measure fermentations at the same process conditions for different time durations. With the help of a forecast algorithm, parameters could be adjusted, and an automated signal loop could be established. Tudu et al. [26] showed that a forecasting approach for the peak prediction of a black tea fermentation process is possible. They used a similar GSA system set-up to detect a peak representing the optimal fermentation time. It will be more challenging to align the total time series data to the optimization goal than to find a specific optimization peak. Still, with the variety of tools in the machine learning field, this is a reasonable goal. Using the GSA system could also deliver enough data for modeling the fermentation process as a digital twin [27].

Conclusions
To ensure high-quality sourdough, monitoring the fermentation is essential. Relevant characteristics include pH and the degree of acidity measured as TTA. A time- and cost-intensive offline measurement of these variables can be avoided by utilizing online gas measurements in prediction models. In this paper, we describe a gas sensor array (GSA) system that can monitor the fermentation process of sourdough online. We used the obtained data to correlate the gas data with offline measurement values of the process variables. Three methods were tested to use the extracted features from GSA to create the prediction models.
The results indicate that the online measurement of exhaust gas from sourdough fermentation with gas sensor arrays can be a cheap and efficient application to predict pH and TTA. The work also showed that the data must be processed thoroughly and with a suitable method to achieve proper prediction performance.
In comparison with other approaches to fermentation monitoring, this approach needs just a simple and cheap set-up. The analysis of the data is conducted during the fermentation in real time. Measuring the variables (especially the pH values) directly might initially result in lower measurement errors. Further, such direct approaches require less training data and, hence, fewer fermentations have to be carried out to generate training data. However, the advantage of our approach is that the error can be decreased by training, and a noninvasive online fermentation monitoring model can be implemented.
The further steps to continue this work first include an analysis of a prototype that applies the PCA to real-time data. As the scope of this work was to identify suitable analysis approaches, this has not yet been carried out. Second, we plan to increase the number of fermentation measurements so that machine learning methods can be reliably included in the model development, and to add supporting measurements to characterize sourdough from different sources and at specific process parameters. Furthermore, a sensory evaluation must be added to guarantee the quality of bread products from the monitored sourdough.

Figure 1. Set-up of the gas sensor array (GSA) system: The GSA contains a DC/AC converter (1), a tube connection from the oxygen gas cylinder (2), the gas measurement chamber (3), a flow meter (4), a tube connection from the bioreactor (5), and the Arduino microcontroller (6).

Figure 2. Schematic diagram of the GSA system [22] and its integration into the fermentation setup.

Figure 3. Peak height and area values from the feature extraction of MQ3 of F5.

Table 1. Labels for the used flours and starter batches.

Table 2. Experimental design for the fermentation measurements of temperature, flour type, and starter batch at a starter amount of 15% and DY of 200.

Table 3. Calculated SSE, RMSE, and percentage model errors of the pH sensor signal method for fermentations F1 to F16.

Table 4. Calculated SSE, RMSE, and percentage model errors of the PCA regression method for the grouped 28 °C, 32 °C, and combined fermentations (assigned with the asterisk) from one flour type.

Table 5. Calculated SSE, RMSE, and percentage model errors of the interval method for fermentations F1 to F16.

Table 6. Calculated SSE, RMSE, and percentage model errors of the interval method for the grouped 28 °C and 32 °C fermentations from one flour type.

Table 7. Results for R2 from the pH and TTA sensor signal method for fermentations F1 to F16.

Table 8. Results of SSE, RMSE, and percentage error for the validations of the sensor signal method from the fermentations of each flour type with the same temperature.

Table 9. SSE, RMSE, and percentage error results for validating the PCA regression method from the grouped temperature models in the 30 °C validation set (combined fermentations assigned with the asterisk from one flour type).

Table 10. Results of the SSE, RMSE, and percentage error for the validation of the interval method.